Treewidth-based algorithms for the small parsimony problem on networks

Background Phylogenetic reconstruction is one of the paramount challenges of contemporary bioinformatics. A subtask of existing tree reconstruction algorithms is modeled by the Small Parsimony problem: given a tree T and an assignment of character-states to its leaves, assign states to the internal nodes of T such as to minimize the parsimony score, that is, the number of edges of T connecting nodes with different states. While this problem is polynomial-time solvable on trees, the matter is more complicated if T contains reticulate events such as hybridizations or recombinations, i.e. when T is a network. Indeed, three different versions of the parsimony score on networks have been proposed and each of them is NP-hard to decide. Existing parameterized algorithms focus on combining the number c of possible character-states with the number of reticulate events (per biconnected component). Results We consider the parameter treewidth t of the underlying undirected graph of the input network, presenting dynamic programming algorithms for (slight generalizations of) all three versions of the parsimony problem on size-n networks running in times $c^t n^{O(1)}$, $(3c)^t n^{O(1)}$, and $6^{tc} n^{O(1)}$, respectively.
Our algorithms use a formulation of the treewidth that may facilitate formalizing treewidth-based dynamic programming algorithms on phylogenetic networks for other problems. Conclusions Our algorithms allow the computation of the three popular parsimony scores, modeling the evolutionary development of a (multistate) character on a given phylogenetic network of low treewidth. Our results subsume and improve previously known algorithms for all three variants. While our results rely on being given a “good” tree-decomposition of the input, encouraging theoretical results as well as practical implementations producing them are publicly available. We present a reformulation of tree decompositions in terms of “agreeing trees” on the same set of nodes. As this formulation may come more naturally to researchers and engineers developing algorithms for phylogenetic networks, we hope to render exploiting the input network’s treewidth as parameter more accessible to this audience.


Introduction
Molecular phylogenetic reconstruction consists in inferring a well-founded evolutionary scenario of a set of species from molecular data [1]. An evolutionary scenario, also called a phylogeny, is usually represented by a directed tree with a unique source called root. In a phylogeny, the tips of the tree are associated with extant species for which we have data, and each internal node represents an extinct species giving rise to new species (a speciation). Therefore, each internal node represents the hypothetical ancestor of all species below it, and the root models the lowest common ancestor of all the species at the tips.

Open Access
Algorithms for Molecular Biology *Correspondence: mathias.weller@u-pem.fr

Parsimony on trees
In this paper, molecular data consists of a set of molecular sequences (e.g. DNA or protein sequences) of the same length (one sequence per species). This kind of data can be seen as a matrix M of n sequences, each having m characters (exhibiting one of c possible states), where M_{i,j} corresponds to the state of the jth character exhibited by the ith species. There are several methods to reconstruct well-founded phylogenies from matrices of characters [1]. They are all based on the idea of retrieving similarities among species by comparing the states taken by these species at the different characters of M. Here, we will focus on parsimony methods. The main hypothesis of these methods is that character changes are not frequent. Thus, the phylogenies that best explain the data are those requiring the fewest evolutionary changes, i.e. the ones having the optimal parsimony score, formally defined in "Parsimony". The problem of finding the optimal parsimony score for a given phylogeny T with respect to an n × m matrix on a finite set of c character states is called the Small Parsimony problem and can be solved in O(n · m · c) time [2] since each column in the matrix can be analyzed independently in linear time. When T is unknown, the problem of finding the phylogeny minimizing the parsimony score is called the Big Parsimony problem. The latter is known to be NP-hard and numerous heuristic techniques for it are known [1].
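To make the Small Parsimony problem on trees concrete, the following minimal sketch scores a single character (one column of M) on a rooted tree. It uses a Sankoff-style bottom-up dynamic program with unit costs, which takes O(n · c²) time per column and is therefore not the O(n · m · c) algorithm of [2]. The function name `small_parsimony` and the dictionary-based tree encoding are assumptions made for illustration only.

```python
from typing import Dict, List


def small_parsimony(children: Dict[str, List[str]], root: str,
                    leaf_state: Dict[str, str], states: List[str]) -> int:
    """Minimum number of state changes on a rooted tree for one character.

    children   -- maps each internal node to the list of its children
    leaf_state -- observed state of each leaf
    states     -- the c possible character states
    """
    INF = float("inf")

    def cost(v: str) -> Dict[str, float]:
        # cost(v)[s] = best score of the subtree below v if v gets state s
        kids = children.get(v, [])
        if not kids:  # leaf: only the observed state is admissible
            return {s: (0 if s == leaf_state[v] else INF) for s in states}
        table = {s: 0 for s in states}
        for child in kids:
            sub = cost(child)
            for s in states:
                # an arc costs 1 exactly if its endpoints differ in state
                table[s] += min(sub[t] + (s != t) for t in states)
        return table

    return int(min(cost(root).values()))


# Hypothetical toy tree: r -> (x, c), x -> (a, b)
toy_tree = {"r": ["x", "c"], "x": ["a", "b"]}
toy_score = small_parsimony(toy_tree, "r",
                            {"a": "A", "b": "A", "c": "G"},
                            ["A", "C", "G", "T"])
```

For a full n × m matrix, the scores of the m columns would simply be summed, reflecting the independence of the columns noted above.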

Parsimony on networks
When the evolution of the species of interest includes, in addition to speciations, reticulate events such as hybridizations or recombinations, a single species may inherit from multiple direct ancestors. In this case, the phylogenies are no longer represented by rooted trees but by rooted DAGs [3] called networks. When scoring a given network, three very different definitions of the parsimony score have been proposed: the hardwired [4], the softwired [5,6], and the parental parsimony score [7]. Roughly, the hardwired score takes into account all edges of the given network (characters are inherited from all parents), the softwired score takes only the edges of any "switching" (each character is inherited from one parent), and the parental score allows embedding lineages into the network (each allele of a character is inherited from one parent). See "Parsimony" for details and Fig. 1 for an example. While these definitions coincide for trees, they give rise to three different small parsimony problems for networks.
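The difference between the hardwired and the softwired score can be illustrated by exhaustive search on a tiny network. The sketch below is a brute-force reference that enumerates all state assignments to internal nodes and, for the softwired score, all switchings (one incoming arc kept per node); it is exponential and is not one of the algorithms of this paper. The parental score is not covered. All names and the example network are hypothetical.

```python
from itertools import product
from typing import Dict, Iterator, List, Sequence, Tuple

Arc = Tuple[str, str]


def changes(arcs: Sequence[Arc], psi: Dict[str, str]) -> int:
    """Number of arcs whose endpoints carry different states."""
    return sum(psi[u] != psi[v] for u, v in arcs)


def switchings(arcs: Sequence[Arc]) -> Iterator[List[Arc]]:
    """Yield every switching: each non-root node keeps one incoming arc."""
    parents: Dict[str, List[str]] = {}
    for u, v in arcs:
        parents.setdefault(v, []).append(u)
    nodes = list(parents)
    for choice in product(*(parents[v] for v in nodes)):
        yield list(zip(choice, nodes))


def brute_force_scores(arcs: Sequence[Arc], leaf_state: Dict[str, str],
                       states: Sequence[str]) -> Tuple[int, int]:
    """Return (hardwired, softwired) parsimony score by exhaustive search."""
    nodes = {x for arc in arcs for x in arc}
    internal = sorted(nodes - set(leaf_state))
    all_switchings = list(switchings(arcs))
    hard = soft = len(arcs)  # trivial upper bounds
    for combo in product(states, repeat=len(internal)):
        psi = {**leaf_state, **dict(zip(internal, combo))}
        hard = min(hard, changes(arcs, psi))  # all arcs count
        soft = min(soft, min(changes(s, psi) for s in all_switchings))
    return hard, soft


# Hypothetical network with a single reticulation h (parents u and v)
toy_arcs = [("r", "u"), ("r", "v"), ("u", "a1"), ("u", "a2"), ("u", "h"),
            ("v", "b1"), ("v", "b2"), ("v", "h"), ("h", "c")]
toy_leaves = {"a1": "0", "a2": "0", "b1": "1", "b2": "1", "c": "0"}
toy_scores = brute_force_scores(toy_arcs, toy_leaves, ["0", "1"])
```

On this example, the hardwired score is 2 (one of the two arcs entering h always connects different states) while the softwired score is 1 (the switching may drop the arc vh).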
When tracing mutually dependent characters (e.g. different genomic locations in a same non-recombinant region) on networks, we also have to make sure that dependent characters are inherited from the same parent (some columns of the matrix have to use the same "switching"/"embedding"). To avoid dealing with this problem, the small parsimony problems on networks have been studied predominantly under the assumption of independent genomic locations. This boils down to having m = 1 since each column of the matrix can be analyzed independently (as is the case for the small parsimony problem on trees). Another popular restriction is to consider binary networks, in which the root has outdegree 2, tips have indegree 1, and internal nodes have either indegree 1 and outdegree 2 (speciations) or indegree 2 and outdegree 1 (reticulations).
The hardwired small parsimony problem has been proven NP-hard and APX-hard whenever the number of states that a character can take, denoted c, is strictly greater than 2, and polynomial-time solvable for binary characters [8]. A polynomial-time 1.35-approximation for all c and a 12/11-approximation for c = 3 have been proposed [8]. Additionally, the problem has been shown fixed-parameter tractable (FPT) in the parsimony score [8, 2^p · O(min(q^{2/3}, √z) · q) time], and in c + r [9, O(n · c^{r+2}) time], where n, q, z are the number of leaves, vertices and edges in the phylogenetic network and p and r are the hardwired parsimony score and the number of reticulate events in the network.
The softwired small parsimony problem is also NP-hard and APX-hard [8,10] for binary characters, and not FPT in the parsimony score (it is NP-hard to decide if the softwired parsimony score is 1). Also, it has been shown that, for any constant ε > 0, no n^{1−ε}-approximation can be computed in polynomial time, unless P = NP. On the positive side, the problem is FPT in c + r [6, 8, O(2^r · n · c) time] and c + ℓ [8,11, O(2^ℓ · c^2 · q · z) time], where ℓ is the maximum number of reticulations over all biconnected components of the network (also called the level of the network).
Unsurprisingly, the parental small parsimony problem has also been proven NP-hard, even for very restricted classes of networks, but it is FPT both with respect

Treewidth for phylogenetic networks
The treewidth of a graph can roughly be described as a measure of "tree-likeness" and it ranks among the smallest of such parameters [14] (in particular, the treewidth can be seen to be smaller than the level ℓ on any network). Together with the fact that it facilitates the design of dynamic programming algorithms, this explains the enormous popularity the treewidth received in the parameterized complexity community [15,16]. Starting with the groundbreaking work of Bryant and Lagergren [17] (using the celebrated result of Courcelle [18]), treewidth also gained traction with researchers studying algorithms for phylogenetics-related problems (surveyed in [19]). While this yielded some algorithms parameterized by the treewidth of the display graph of multiple trees (the result of "gluing" all trees at their leaves), we are not aware of any algorithms parameterized by the treewidth of the input network. In an attempt to facilitate the use of this parameter in future work, we dedicate Sect. "An alternative formulation of treewidth" to presenting a "phylogenetics-friendly" formulation by representing tree-decompositions of the input network as a rooted tree Ŵ on the same vertex set as the network. In particular, this formulation generalizes our previously considered parameter "scanwidth" [20], which can be seen as a variant of treewidth that takes edge directions into account. While we expected scanwidth-based dynamic programming formulations to be easier and more straightforward than their treewidth-counterparts, this comes at the cost of the scanwidth being potentially arbitrarily larger than the treewidth. Intuitively speaking, we expect scanwidth dynamic programming to be easier since phylogenetic networks exhibit a "natural flow of information": most often, we know everything about the leaves, but the more we approach the root, the more information has to be inferred from the lower parts.
In contrast to the scanwidth-layout, tree-decompositions disregard edge directions and, thereby, this "natural flow". Thus, while using the scanwidth allows for more naïve and intuitive dynamic programming formulations, using the treewidth requires more care and ingenuity.
Since we will suppose that a (not necessarily optimal) tree-decomposition of the input network is given in the input, let us discuss the current state-of-the-art for computing good decompositions. Optimal decompositions are indeed very hard to compute, with even the best-known parameterized algorithm being considered impractical (see survey [15]). This gloomy cloud has, however, two silver linings: First, if we do not insist on optimality, then we can use a recently published algorithm to compute 2-approximate tree-decompositions in 2^{O(k)} n^{O(1)} time [21]. We will state our results in a way that allows plugging in any algorithm that computes or approximates tree decompositions. Second, with development driven by recent instances of the PACE challenge [22], more practical exact algorithms to compute tree decompositions are now available as well [23]. Herein, the running times of Tamaki's implementation [23] are hard to predict and show erratic behavior even for fixed graph size. As expected, however, examples for high running times occur only for instances with high treewidth, that is, for "highly tangled" networks (see Fig. 2 for two select examples). This hints towards some hidden properties of the input networks that govern the complexity of treewidth computations. As we expect "natural networks" to be only moderately tangled, we think that existing algorithms, exact and approximate, are currently well-enough developed to deal with real world phylogenetic networks in reasonable timeframes. Indeed, we would welcome efforts similar to those made for the treewidth to also be made for the previously discussed scanwidth, which is also hard to compute [20].
For ease of presentation, the three main proofs (correctness of the dynamic programming formulations) are given as high-level sketches and their more detailed and formal versions can be found in the appendix.
Scornavacca and Weller Algorithms for Molecular Biology (2022) 17:15

Mappings
For any x and y, we define δ(x, y) to be 0 if x = y and 1, otherwise, and we abbreviate 1 − δ(x, y) =: δ̄(x, y). We further abbreviate δ(φ(x), φ(y)) as δ_φ(x, y) for any function φ. We may denote a pair (x, y) as x → y if it is referring to an assignment of y to x by some function and as xy if it refers to an arc in a network. We sometimes use the name of a function φ : X → Y to refer to its set of pairs {x → y | φ(x) = y} and we let φ|_Z := {(x → y) ∈ φ | x ∈ Z} denote the restriction of φ to Z. We say φ(x) = ⊥ to indicate that φ is not defined for x. We denote the result of forcing φ(x) = y (whether or not x is mapped by φ) as Finally, for sets Z, X and Y ⊆ X and functions φ and ψ, we write ψ φ (and say that ψ is a subfunction of φ) if (a) φ : X → Z and ψ : Y → Z and ψ(x) ≤ φ(x) for all x ∈ Y, or (b) φ : X → 2^Z and ψ : Y → Z and ψ(x) ∈ φ(x) for all x ∈ Y, or (c) φ : X → 2^Z and ψ : Y → 2^Z and

Graphs and phylogenetic networks
In this work, we consider directed acyclic graphs (DAGs) N that may have a unique source ρ N called root. If the sinks (aka leaves) of N are labeled, we call N a phylogenetic network. We refer to the nodes and directed edges (arcs) of N by V(N) and A(N), respectively. The underlying undirected graph of N is the undirected graph on node-set V(N) that contains an edge {u, v} if and only if N contains the arc (u, v). As we do not deal with mixed graphs, we use the term uv to refer to the arc from u to v or the undirected edge between u and v, depending on the context. We refer to the edge-set of an undirected graph G as E(G).
We denote the set of nodes of a DAG N with in-degree at least two by R(N) and we call such nodes reticulations. If R(N) = ∅, then N is called a tree. The result of removing, for each v ∈ R(N), all but one of its incoming arcs is called a switching of N and S(N) denotes the set of all switchings of N (observe that all switchings are spanning trees). For each v ∈ V(N), we denote the successors (or "children") of v in N by Succ_N(v) and its predecessors (or "parents") by Pred_N(v). If N contains a directed u-w-path, then we say that w is a descendant of u and u is an ancestor of w (denoted as w ≤_N u, and w <_N u if u ≠ w). A set Z ⊆ V(N) such that u ≮_N w and w ≮_N u for all u, w ∈ Z is called an anti-chain in N. The induced subgraph N[Z] of a set Z ⊆ V(N) is the result of removing all nodes x ∈ V(N) \ Z from N (together with their incident arcs) and, for any v ∈ V(N), the network

Fig. 2
Tamaki's tree-decomposer [23] has a harder time with the right, more tangled instance (50 nodes, 175 edges, treewidth 25 computed in 79s) than with the larger instance on the left (465 nodes, 1004 edges, treewidth 9 computed in 0.5s), illustrating that tangledness is a more important factor than size. Indeed, both instances display a tangledness that already exceeds what we expect to see in real-world phylogenetic networks. The instances are ex065 (right) and ex011 (left) of the PACE2017 challenge [22]

An alternative formulation of treewidth
In this section, we give an alternative definition of the treewidth, which allows tackling the small parsimony problem for networks in a simpler and more intuitive way. Note that this alternative definition is known in the FPT community (Dendris et al. [24], referring to Arnborg [25], call it the "support" of a vertex with respect to an ordering, and Mescoff et al. [26] call it "tree vertex separation"). However, since in these works its connection to treewidth is mostly touched upon in passing, we felt the need to prove it explicitly here. Since tree decompositions are agnostic to edge directions, all results in this section are stated for undirected graphs G instead of networks N. Keeping in mind that the framework is to be applied to phylogenetic networks, all examples will be made with DAGs. The reader may simply ignore the edge directions in the examples as all undirected graphs will be underlying undirected graphs of some DAGs.
For a linear ordering σ of the nodes of an undirected graph G and any x ∈ V(G), we write y ≤_σ x for all nodes y preceding x in σ (including x itself) and let σ[1..x] denote the restriction of σ to these nodes. We write x G,σ y if x and y are connected in G[σ[1..x]] (see Fig. 3 for an example). Note that G,σ is a partial order on V(G). We consider nodes outside σ[1..v] that have an edge to the connected component of v in G[σ[1..v]]. We denote these nodes by ZW^σ_v and their number by zw^σ_v.

Definition 1
Let σ be a linear order of the nodes of an undirected graph G and let v ∈ V(G). Then, ZW^σ_v := {u ∈ V(G) \ σ[1..v] | uw ∈ E(G) for some w that is connected to v in G[σ[1..v]]} and zw^σ_v := |ZW^σ_v|. We abbreviate zw(σ) := max_v zw^σ_v and zw(G) := min_σ zw(σ) and we refer to the transitive reduction of the directed graph (V(G), {uv ∈ V(G)^2 | u G,σ v}) as the canonical tree Ŵ_σ of σ for G (we will see below that Ŵ_σ is a rooted tree; see Fig. 3).
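As an illustration of Definition 1, the following sketch computes the sets ZW^σ_v and the value zw(σ) for a given ordering. It assumes the reading that ZW^σ_v contains exactly the nodes after v in σ that are adjacent to the connected component of v in G[σ[1..v]]. The function names `zw_sets` and `zw` and the adjacency-dictionary encoding are hypothetical.

```python
from typing import Dict, Hashable, List, Set

Node = Hashable


def zw_sets(adj: Dict[Node, Set[Node]],
            order: List[Node]) -> Dict[Node, Set[Node]]:
    """Map each node v to ZW^sigma_v for the linear order `order`."""
    position = {v: i for i, v in enumerate(order)}
    result: Dict[Node, Set[Node]] = {}
    for i, v in enumerate(order):
        # connected component of v in G[sigma[1..v]] via depth-first search
        component = {v}
        stack = [v]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if position[w] <= i and w not in component:
                    component.add(w)
                    stack.append(w)
        # nodes outside sigma[1..v] with an edge into that component
        result[v] = {w for u in component for w in adj[u] if position[w] > i}
    return result


def zw(adj: Dict[Node, Set[Node]], order: List[Node]) -> int:
    """zw(sigma): the maximum of |ZW^sigma_v| over all nodes v."""
    return max(len(s) for s in zw_sets(adj, order).values())


# Hypothetical example: the cycle a-b-c-d-a, whose treewidth is 2
cycle = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "a"}}
cycle_zw = zw(cycle, ["a", "b", "c", "d"])
```

Minimizing `zw` over all orderings would give zw(G); the sketch only evaluates one ordering and makes no attempt at finding a good one.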
In the following, we say that a rooted tree Ŵ on V(G) agrees with an undirected graph G if, for all uv ∈ E(G), either u <_Ŵ v or v <_Ŵ u. We also extend the definition of G,σ to such trees by writing u G,Ŵ v if u and v are connected in G[Ŵ_u]. In analogy to Definition 1, G,Ŵ gives rise to a set YW^Ŵ_v containing the nodes "above" v in Ŵ that have an edge in G to a node "below" v in Ŵ.

Definition 2 (see Fig. 3)
Let G be an undirected graph and let Ŵ agree with G. For each v ∈ V(G), we define YW^Ŵ_v := {u ∈ V(G) | v <_Ŵ u and uw ∈ E(G) for some w ≤_Ŵ v} and yw^Ŵ_v := |YW^Ŵ_v|. Then, we abbreviate yw(Ŵ) := max_v yw^Ŵ_v and yw(G) := min_Ŵ yw(Ŵ).

Fig. 3
Example of a network N (left) with a linear order σ of its nodes (below) as well as their canonical tree Ŵ_σ (right) whose arcs are not drawn (the arcs of N are drawn in their stead). Reticulations are black, leaves are boxes. For the first (wrt. σ) reticulation x, the set V(Ŵ_σ x) is marked (gray area) and equals σ[1..x] in this example. Further, the arcs in A_x(N) are dotted and the nodes in YW^Ŵ_x = ZW^σ_x are gray pentagons. Note that x N,σ ρ_N but neither

Note that the path P resulting from traversing σ from right to left is a rooted tree agreeing with G. However, yw(P) is expected to be large for this choice. Indeed, we can show that the most "refined" trees Ŵ have the smallest yw(Ŵ).
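The notion of an agreeing tree and its width can be sketched in a few lines. The code below assumes that YW^Ŵ_v consists of the proper ancestors u of v in Ŵ that are adjacent in G to v or to a descendant of v, and it rejects trees in which two adjacent nodes are unrelated. The parent-map encoding and the names `yw_sets` and `yw` are hypothetical; G is assumed to be simple (no self-loops).

```python
from typing import Dict, Hashable, Optional, Set

Node = Hashable


def yw_sets(adj: Dict[Node, Set[Node]],
            parent: Dict[Node, Optional[Node]]) -> Dict[Node, Set[Node]]:
    """Map each node v to YW_v for the rooted tree given by `parent`.

    `parent` maps every node to its parent in the tree (root maps to None).
    Raises ValueError if the tree does not agree with the graph.
    """
    def proper_ancestors(v: Node) -> Set[Node]:
        out = set()
        p = parent[v]
        while p is not None:
            out.add(p)
            p = parent[p]
        return out

    anc = {v: proper_ancestors(v) for v in parent}
    # agreement: the endpoints of every edge must be related in the tree
    for u in adj:
        for w in adj[u]:
            if u not in anc[w] and w not in anc[u]:
                raise ValueError(f"edge {u}-{w} joins unrelated tree nodes")

    result: Dict[Node, Set[Node]] = {v: set() for v in parent}
    for w in adj:
        for u in adj[w]:
            if u in anc[w]:
                # u lies above w, so u is in YW_v for every v with w <= v < u
                v = w
                while v != u:
                    result[v].add(u)
                    v = parent[v]
    return result


def yw(adj: Dict[Node, Set[Node]],
       parent: Dict[Node, Optional[Node]]) -> int:
    """yw of the tree: the maximum of |YW_v| over all nodes v."""
    return max(len(s) for s in yw_sets(adj, parent).values())


# Hypothetical example: the cycle a-b-c-d-a with the path d > c > b > a
cycle = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "a"}}
path_tree = {"d": None, "c": "d", "b": "c", "a": "b"}
cycle_yw = yw(cycle, path_tree)
```

The path tree used here corresponds to traversing the ordering a, b, c, d from right to left, as described above.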

Lemma 1 Let Ŵ and Ŵ ′ be rooted trees agreeing with an undirected graph G and let
The following lemma proves a number of interesting properties relating σ and Ŵ σ such as Ŵ σ being a rooted tree whose descendant relation is a refinement of ≤ σ , culminating in the equality of ZW σ x and YW Ŵσ x for all x.
Lemma 2 Let σ be a linear order of the nodes of a connected undirected graph G and let Ŵ σ be its canonical tree. Then, y contains a neighbor of x in G. (j) Each x ∈ V (G) has at most as many children in Ŵ σ as it has neighbors in G.
Proof (a), (b): We show for all vertices w on a u-v-path p in Ŵ_σ that w ≤_σ u and u G,σ w. The base case w = u holds trivially. For the induction step, let q precede w in p. Since Ŵ_σ contains the arc qw, Definition 1 implies q G,σ w and, since q ≤_σ u by induction hypothesis, w ≤_σ q ≤_σ u and u G,σ w. For the reverse direction of (b), note that, by Definition 1, uv is an arc of the DAG of which Ŵ_σ is the transitive reduction.
that Ŵ_σ is connected and rooted at r. (e): To prove that Ŵ_σ is a tree, assume there is a vertex x ∈ V(G) with two distinct parents y and z in Ŵ_σ. Without loss of generality, let y <_σ z. By (b), y G,σ x and z G,σ x, implying that σ[1..y] contains a y-x-path p_y in G and σ[1..z] contains a z-x-path p_z in G. Since σ[1..z] contains σ[1..y], the concatenation of p_z with (the reverse of) p_y is a path in G whose nodes are in σ[1..z].
Then, z ≤_σ x <_σ y. By (b), z ≤_{Ŵ_σ} x and, by (f), z ≤_{Ŵ_σ} y. Thus, as Ŵ_σ is a tree (by (e)), x and y are not unrelated in Ŵ_σ. Moreover, y ≰_σ x implies y ≰_{Ŵ_σ} x by (b) and, thus, x <_{Ŵ_σ} y. Together with z ≤_{Ŵ_σ} x and yz ∈ E(G), this implies y ∈ YW^{Ŵ_σ}_x. (i) By (b), G contains an x-y-path p whose vertices are in σ[1..x] and, thus, x G,σ v for all vertices v on p. We show u ≤_{Ŵ_σ} y for all u on p except x, starting with the obvious y ≤_{Ŵ_σ} y. Then, this implies that the second vertex on p, which is a neighbor of x in G, is in the subtree of Ŵ_σ rooted at y. Let v ≤_{Ŵ_σ} y be a vertex on p and let u be the predecessor of v in p. If u = x then we are done, so suppose u ≠ x. Further, by (f), either u <_{Ŵ_σ} v ≤_{Ŵ_σ} y, implying the claim directly, or v <_{Ŵ_σ} u, implying that u is on an x-v-path in Ŵ_σ. By (e) there is only one such path and it starts with (x, y, . . .) and, since u ≠ x, this implies u ≤_{Ŵ_σ} y. (j) is immediate from (i) combined with (e).
In order to show that zw(G) and yw(G) coincide, we need to "normalize" some aspects of the structure of agreeing trees. To this end, we use the following operation on rooted trees which can be interpreted as contracting a set of unwanted nodes upwards. Formally, for a rooted tree T and for X ⊂ V (T ) that does not contain the root r of T, we let T ↑ X denote the result of (1) replacing each arc uv with uv ∩ X = {u} with the arc wv where w is the lowest ancestor of u that is not in X, and (2) removing all nodes in X from T. Note that T ↑ X may have strictly larger out-degree than T, but does not create new ancestor-descendant relations.
Observation 1 Let T be a tree, let X ⊆ V (T ) not contain its root, and let u, v ∈ V (T ↑ X) with u ≤ T ↑X v . Then, u ≤ T v.
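The operation T ↑ X is easy to state on a parent map: every surviving node is re-attached to its lowest ancestor outside X. The following sketch implements this reading and includes a small helper for checking Observation 1 on examples. The names `contract_upwards` and `is_ancestor_or_equal` as well as the parent-map encoding are hypothetical.

```python
from typing import Dict, Hashable, Optional, Set

Node = Hashable


def contract_upwards(parent: Dict[Node, Optional[Node]],
                     X: Set[Node]) -> Dict[Node, Optional[Node]]:
    """Return the parent map of T ↑ X (root of T must not be in X)."""
    root = next(v for v, p in parent.items() if p is None)
    if root in X:
        raise ValueError("X must not contain the root")
    new_parent: Dict[Node, Optional[Node]] = {}
    for v, p in parent.items():
        if v in X:
            continue  # step (2): nodes of X are removed
        # step (1): climb to the lowest ancestor of v that is not in X
        while p is not None and p in X:
            p = parent[p]
        new_parent[v] = p
    return new_parent


def is_ancestor_or_equal(parent: Dict[Node, Optional[Node]],
                         u: Node, v: Node) -> bool:
    """True if u <= v in the tree, that is, v lies on the path from u to the root."""
    while u is not None:
        if u == v:
            return True
        u = parent[u]
    return False


# Hypothetical example: r -> a -> {b -> d, c}; contracting a and b upwards
tree = {"r": None, "a": "r", "b": "a", "c": "a", "d": "b"}
contracted = contract_upwards(tree, {"a", "b"})
```

In the example, c and d both become children of r, so the out-degree of r grows from 1 to 2, while every ancestor-descendant pair of the result is also such a pair in the original tree.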

Lemma 3
Let Ŵ be a rooted tree agreeing with an undirected graph G. Then, there is some rooted tree Ŵ* agreeing with G such that yw(Ŵ*) ≤ yw(Ŵ) and, for all u ∈ V(G), the graph G[Ŵ*_u] is connected.
Proof Let u ∈ V(G) such that the set X of nodes in Ŵ_u that are not connected to u in G[Ŵ_u] is non-empty. We will modify Ŵ into Ŵ′ with yw(Ŵ′) ≤ yw(Ŵ) such that Ŵ′ agrees with G and the relation ≤_Ŵ′ is a strict subset of ≤_Ŵ. To this end, note that u has a parent w in Ŵ as, otherwise, G[Ŵ_u] = G, implying X = ∅. Then, Ŵ′ results from Ŵ by (see Fig. 4)
1. replacing Ŵ by Ŵ ↑ (Ŵ_u \ X) and
2. dangling Ŵ_u ↑ X from w.
First, we show that Ŵ′ agrees with G. To this end, let xy ∈ E(G) and let x and y be unrelated in Ŵ′. If neither x nor y are in Ŵ_u then, by construction of Ŵ′, they are also unrelated in Ŵ, contradicting that Ŵ agrees with G. So, without loss of generality, suppose x ≤_Ŵ u. Since xy ∈ E(G) and Ŵ is a tree agreeing with G, we thus know that u and y are not unrelated in Ŵ. If u <_Ŵ y, then w ≤_Ŵ y and, thus, x ≤_Ŵ′ y. Thus, suppose y ≤_Ŵ u. Clearly, if x, y ∈ X or x, y ∉ X, then x and y are also unrelated in Ŵ, contradicting its agreement with G. Thus, without loss of generality, suppose x ∈ X and y ∉ X, that is, and u G,Ŵ y, contradicting xy ∈ E(G).
Second, we show that ≤_Ŵ′ is a strict subset of ≤_Ŵ. To this end, let xy ∈ A(Ŵ′) and assume towards a contradiction that y ≮_Ŵ x. Clearly, if x ≰_Ŵ′ w, then xy ∈ A(Ŵ), contradicting y ≮_Ŵ x. Further, if x = w, then either y ∈ X or y is a child of w in Ŵ, all of which imply y <_Ŵ x. Thus, x <_Ŵ′ w. Since xy ∩ X = {x} or xy ∩ X = {y} contradicts xy ∈ A(Ŵ′), we have x, y ∈ X or x, y ∉ X. But then, y <_Ŵ x by Observation 1. Thus, ≤_Ŵ′ is a subset of ≤_Ŵ and it is strict since we have v ≤_Ŵ u and v ≰_Ŵ′ u for all v ∈ X ≠ ∅.

Lemma 4
Let Ŵ be a tree agreeing with a graph G and let p be a non-empty path in G. Then, p contains a unique maximum u with respect to Ŵ, that is, v ≤ Ŵ u for all vertices v of p.
Proof Let x on p be maximal with respect to Ŵ (that is, for all z on p, we have x ≮_Ŵ z) and assume towards a contradiction that there is another vertex y ≠ x on p that is maximal w.r.t. Ŵ. Without loss of generality, let x precede y in p and let p_xy denote the unique x-y-subpath of p. Since y ≰_Ŵ x, there is an edge st ∈ E(G) on p_xy with s ≤_Ŵ x and t ≰_Ŵ x. Hence, t ≰_Ŵ s. Further, s ≰_Ŵ t since, otherwise, the unique t-s-path in Ŵ contains x, contradicting its maximality. But then Ŵ does not agree with G.
"≤ ": Let Ŵ be some rooted tree agreeing with G such that yw(Ŵ) = yw(G) . By Lemma 3, we may assume Let σ be any ordering of V(G) obtained by repeatedly picking and removing any leaf of Ŵ.
By Lemma 4, p has a unique maximum w in Ŵ. Hence, v ≤_Ŵ w and, by " .v], we also have w ≤_σ v. Thus, v = w and, since u ∈ V(p), we have u ≤_Ŵ w = v by maximality of w.
To prove the lemma, we show ZW^σ_x ⊆ YW^Ŵ_x. To this end, let y ∈ ZW^σ_x, that is, y >_σ x and there is some z ∈ σ[1..x] with yz ∈ E(G) and x G,σ z. By Claim 1, z ≤_Ŵ x. Further, as yz ∈ E(G) and Ŵ agrees with G, y and z are not unrelated in Ŵ and, since z ≤_Ŵ x, neither are x and y. Since y <_Ŵ x implies y <_σ x by Claim 1, contradicting y >_σ x, we conclude x <_Ŵ y. Together with z ≤_Ŵ x and yz ∈ E(G), this implies y ∈ YW^Ŵ_x.

Fig. 4
The rooted trees Ŵ, Ŵ′, and Ŵ* are drawn with thick, gray lines. Thin, black lines are edges of G. For the indicated node u, the black nodes are in X, that is, they are below u in Ŵ but not connected to u in G[Ŵ_u].

Having shown that the notions of zw(G) and yw(G) are equivalent, we can now turn our attention to the treewidth. In particular, we introduce (nice) tree-decompositions and use their properties to show that the treewidth of any undirected graph G equals yw(G).

Definition 3 (see Fig. 5)
Let G be an undirected graph and let T be a rooted tree whose vertices are associated to subsets of V(G) by a function B :
We call (T, B) a tree decomposition of G and its width is We call (T, B) nice if T is binary and all x ∈ V(T) fall into one of the following categories
As stated at the beginning of the section, recall that, while tree decompositions are defined for undirected graphs, we may talk about tree decompositions of DAGs, meaning tree decompositions of their underlying undirected graphs. Note that all graphs G have a nice tree decomposition with |V(T)| ∈ O(tw(G) · |G|) and width tw(G) [27]. Further, since all bags of (T, B) containing a vertex v of G are connected, we can observe the following.
Observation 2 Let (T, B) be a nice tree decomposition for an undirected graph G and let v ∈ V (G) . Then, T contains a single "forget v"-node x and y < T x for all y with v ∈ B(y). Proposition 1 Let G be an undirected graph. Then, yw(G) = tw(G). Further, given a tree decomposition (T, B) for G, we can compute a tree Ŵ agreeing with G such that yw(Ŵ) = tw(T , B) in linear time.
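Since the results assume that a tree decomposition is given, it can be useful to validate such an input before using it. The sketch below checks the three standard conditions of a tree decomposition (every vertex occurs in a bag, every edge is contained in a bag, and the bags containing any fixed vertex are connected in T) and returns the width, taken here as the largest bag size minus one. These standard conditions are an assumption of the sketch, as are the function name `decomposition_width` and the input encoding; the code does not check that T itself is a tree, nor that the decomposition is nice.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

Node = Hashable


def decomposition_width(vertices: Iterable[Node],
                        edges: Iterable[Tuple[Node, Node]],
                        tree_edges: Iterable[Tuple[Node, Node]],
                        bags: Dict[Node, Set[Node]]) -> int:
    """Validate (T, B) for G and return its width; raise ValueError if invalid."""
    vertices = list(vertices)
    # (1) every vertex of G occurs in some bag
    covered = set().union(*bags.values()) if bags else set()
    if not set(vertices) <= covered:
        raise ValueError("some vertex occurs in no bag")
    # (2) both endpoints of every edge of G share a bag
    for u, v in edges:
        if not any(u in bag and v in bag for bag in bags.values()):
            raise ValueError(f"edge {u}-{v} is not contained in any bag")
    # (3) the bags containing a fixed vertex are connected in T
    tree_adj: Dict[Node, Set[Node]] = {x: set() for x in bags}
    for x, y in tree_edges:
        tree_adj[x].add(y)
        tree_adj[y].add(x)
    for v in vertices:
        holders = {x for x, bag in bags.items() if v in bag}
        start = next(iter(holders))
        seen, stack = {start}, [start]
        while stack:
            x = stack.pop()
            for y in tree_adj[x]:
                if y in holders and y not in seen:
                    seen.add(y)
                    stack.append(y)
        if seen != holders:
            raise ValueError(f"bags containing {v} are not connected")
    return max(len(bag) for bag in bags.values()) - 1


# Hypothetical example: the path a-b-c with two bags
path_width = decomposition_width(
    ["a", "b", "c"], [("a", "b"), ("b", "c")],
    [(1, 2)], {1: {"a", "b"}, 2: {"b", "c"}})
```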
Proof "≤ ": Let (T, B) be a nice tree decomposition for G of width tw(G) and let F ⊂ V (T ) denote the set of all "forget"-nodes in T (noting that F contains the root of T). We define Ŵ as the transitive reduction of First, we show that Ŵ agrees with G. To this end, let uv ∈ E(G) and let f u , f v ∈ F denote the unique "forget u" and "forget v"-nodes in T, which are distinct since T is nice. By Definition 3(a), there is a node q ∈ V (T ) with u, v ∈ B(q) and, by Observation 2, q < T f u , f v . Thus, f u and f v are not unrelated in T and, thus, neither in Ŵ. Second, we show for all v ∈ Ŵ and the unique "for- Let f u and f w be the unique "forget u" and "forget w"-nodes in T, which are distinct since T is nice. Then, w ≤ Ŵ v < Ŵ u and, since f u , f w ∈ F , we also have f w ≤ T f v < T f u . Since uw ∈ E(G) , Definition 3(a) implies that there is a node q of T with u, w ∈ B(q) and, by Observation 2, First, to prove Definition 3(a), let uv ∈ E(G) . Since Ŵ agrees with G, either u < Ŵ v or v < Ŵ u . Without loss of generality, suppose the latter.
, it suffices to prove u ∈ B(x) for all x ∈ V (p) (since v has been chosen arbitrarily, a path with these properties exists for all v ′ with u ∈ B(v ′ ) , so they all contain the node u and are, thus, connected). For

Parsimony
Notation Large parts of this work are in the context of a rooted tree Ŵ on the node set V(N) of a given phylogenetic network N (see Fig. 6). Specifically for the tree Ŵ, we permit ourselves to abbreviate V(Ŵ_x) to Ŵ_x to increase readability. In such context, we additionally define the following sets for any nodes y, z ∈ V(N): Pred^{↑y}_N(z) := Pred_N(z) ∩ Ŵ_y and Pred^{↓y}_N(z) := Pred_N(z) \ Ŵ_y denote the respective predecessors of z in N that are or are not in Ŵ_y. Likewise, Succ^{↓y}_N(z) := Succ_N(z) ∩ Ŵ_y and Succ^{↑y}_N(z) := Succ_N(z) \ Ŵ_y denote the respective successors of z in N that are or are not in Ŵ_y (note that the arrow in the notation indicates the direction of the arc between z and the members of the set when drawing Ŵ top-down). If z = y, we drop y and simply write  For brevity, we abbreviate A X (N) .
Introduction to Parsimony Given states of a character, observed in extant species, as well as a species phylogeny, the small parsimony problem asks to infer states of the same character for all ancestral species such as to minimize the "parsimony score" of this assignment. This problem comes in three flavors called "hardwired", "softwired", and "parental" parsimony. Throughout this section, let C be a fixed finite set (a "character"). For convenient use of the subfunction relation, let C be an anti-chain (that is, for each x, y ∈ C, we have x ≤ y only if x = y). Formally, for a phylogeny N and a function φ : V(N) → 2^C, we define the hardwired and softwired parsimony score as the minimum of Σ_{uw∈A(N)} δ_ψ(u, w) and of min_{S∈S(N)} Σ_{uw∈A(S)} δ_ψ(u, w), respectively, over all ψ : V(N) → C with ψ(v) ∈ φ(v) for all v ∈ V(N). The "parental parsimony" is defined using "parental trees" but, in this work, we use the equivalent formulation using lineage functions [12].

Definition 4 A lineage function for a phylogeny N is any function
Given N and a function φ : V (N ) → 2 C , we denote the set of all lineage functions f on N with f φ as LF N ,φ . Finally, the parental parsimony score is For each of the presented variants, we give a dynamic programming formulation using a given tree Ŵ that agrees with the undirected graph G underlying the input network and corresponds to Lemma 3, that is, each nonleaf x of Ŵ has a child v with x ∈ YW Ŵ v . The running time of the resulting algorithm will depend on the width yw(Ŵ) of Ŵ (recalling that yw(Ŵ) coincides with the treewidth of G for optimal Ŵ).
As stated in the introduction, in this paper we focus on the case of analyzing a specific position in the genome. Since the function φ can associate several states to the same leaf, our definition permits describing polymorphism in a population. While in our current formulation the algorithms "choose" an optimal state to associate to each leaf, the parental parsimony can be easily modified to explain all states of each leaf at the end of the run. This allows keeping the information on polymorphism in all steps of the algorithm (see "Parental parsimony"). Note also that φ can associate information to internal nodes, thus permitting the user to impose restrictions on the states associated to ancestral species.
In the presentation of the dynamic programming, a x 2 are independent of one another, allowing an implementation to forget Q y 1 x 2 still is. In the following, for an anti-chain Y in Ŵ and a class G of subnetworks of N, a Y-substitution system of G is a series of sub- is also in G. Roughly, we can "swap out" the arcs in A_y(N′) for A_y(N_y) for each y ∈ Y without losing membership in G. Note that the N_y are not necessarily distinct, so a trivial Y-substitution system for {N′} would be (N′)_{y∈Y}. The formulations are based on the following lemma about independent sub-solutions, showing that an optimal solution (S, ψ) for a sub-network (of G) "below" an anti-chain Z in Ŵ is also optimal on any sub-network "below" an anti-chain Y in Ŵ that is itself "below" Z (among all solutions with ψ's behavior on ⋃_{y∈Y} YW^Ŵ_y).
Let G be a class of subnetworks of N and let S ∈ G and ψ : V (N ) → C such that (a) z∈Z uw∈A z (S) δ ψ (u, w) is minimum among all such S and ψ. Let (S y ) y∈Y be a Y-substitution system for G and let ψ y : V (N ) → C for each y ∈ Y such that (b) ψ y and ψ coincide on YW Ŵ y . Then, Proof Towards a contradiction, assume that the lemma is false. We construct ψ * : Note that ψ * and ψ coincide with ψ y on YW Ŵ y for all y ∈ Y . Thus, δ ψ * (u, w) = δ ψ y (u, w) if uw ∈ A y (S * ) for any y ∈ Y and δ ψ * (u, w) = δ ψ (u, w) , otherwise. Further, we construct a digraph S * := (V (N ), (A(S)\ y∈Y A y (S)) ∪ y∈Y A y (S y )) which is in G since (S y ) y∈Y is a Y-substitution system for G . Since all S y are subnetworks of N, we know that Ŵ agrees with S * . Furthermore, contradicting optimality of S and ψ (that is, Lemma 6(a)) since S * ∈ G.

Hardwired parsimony
To compute the hardwired parsimony score at a node v of N, we require knowledge of the character assigned to v and its neighbors. For all u ∈ YW Ŵ v , we thus "guess" the character ψ(u) assigned to u by an optimal assignment. In our dynamic programming, we scan Ŵ can be calculated as follows.
. Then, we define a table entry

Fig. 7 Lemma 6 proves that any solution (S, ψ) that is optimal on sub-trees rooted at Z in Ŵ must also be optimal (among all solutions with ψ's behavior on ⋃ y∈Y YW Ŵ y (gray box on top)) on all sub-trees of Ŵ that are rooted below Z (at Y). That is, no solution (S y , ψ y ) can be better than (S, ψ) on the sub-network induced by Ŵ y for any y ∈ Y . To prove this, a new solution (S * , ψ * ) is constructed by replacing the sub-solution of (S, ψ) below Y by the sub-solutions (S y , ψ y ) below Y

Then, " ≥ " follows from optimality of ψ on A x (N ).
For " ≤ ", it suffices to show that the cost of ψ on A x (N ) is equal to the result of setting c x := ψ(x) in the right hand side of (3) (which is a valid choice for the minimum since In order to solve the hardwired parsimony problem given N, φ and Ŵ , all we have to do is compute T HW [x, ψ x ] for each x bottom-up in Ŵ and each of the (at most) Proposition 1 lets us turn tree decompositions of N into trees Ŵ agreeing with N, allowing us to replace yw(Ŵ) by tw(N ) , incurring an additional running time of

Corollary 1 Let (N , φ) be an instance of Hardwired Parsimony. Let t ≥ tw(N ) and let T be the time in which a width-t tree decomposition of N can be computed. Then, the hardwired parsimony score of (N , φ) can be computed
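As a point of reference for the quantity being computed in this section, the following minimal sketch evaluates the hardwired parsimony score by exhaustive search: it enumerates every assignment ψ that respects φ and counts the arcs whose endpoints receive different states. This is not the dynamic programming described above; it is exponential in |V(N)| and only meant for checking tiny instances. The function name, the arc-list and dictionary representation, and the toy network are assumptions made for illustration.

```python
from itertools import product

def hardwired_score_bruteforce(arcs, phi):
    """Minimum number of arcs of N whose endpoints get different states.

    arcs: list of (u, w) pairs, the arcs of the network N.
    phi:  dict mapping every node to the set of states allowed for it.
    """
    nodes = sorted(phi)
    best = None
    # Enumerate every assignment psi with psi(v) in phi(v) for all v.
    for choice in product(*(sorted(phi[v]) for v in nodes)):
        psi = dict(zip(nodes, choice))
        cost = sum(1 for u, w in arcs if psi[u] != psi[w])
        if best is None or cost < best:
            best = cost
    return best

# Hypothetical toy network: root r, tree nodes a and b,
# reticulation h with parents a and b, leaves x, y, z.
arcs = [("r", "a"), ("r", "b"), ("a", "h"), ("b", "h"),
        ("a", "x"), ("b", "y"), ("h", "z")]
states = {0, 1}
phi = {"r": states, "a": states, "b": states, "h": states,
       "x": {0}, "y": {1}, "z": {1}}
score = hardwired_score_bruteforce(arcs, phi)
```

On this toy instance, assigning state 1 to all internal nodes leaves only the arc (a, x) with differing endpoints, so the score is 1.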

Softwired parsimony
In contrast to the hardwired parsimony score, where the computation of the cost of the incident edges of a node x only required knowledge of the characters assigned to neighbors of x, computing the softwired score additionally requires knowledge of which parent of x remains a parent in the sought switching. A table entry T SW [x, . . .] contains the smallest combined cost of all arcs in A x (S) for a switching S of N minimizing this cost. To be able to compute an entry for x ∈ V (N ) , we not only need to "guess" ψ x but, additionally, some representation of the switching S. In particular, in S, no child of x may have another parent than x. However, since children of x in N may be above x in Ŵ , we have to "guess" which children of x in N are still children of x in S. Such a guess manifests itself as an additional index R x of the dynamic programming table (note that we clearly only have to store this information for children of x that are reticulations). Indeed, this information has to be stored for all nodes considered below x who still have children in YW Ŵ x . Thus, we index our DP-table also by a subset R x ⊆ YW Ŵ x ∩ R(N ) containing a reticulation r ∈ R(N ) if and only if Ŵ x contains a parent v of r and vr is an arc of an optimal switching S for

Definition 6
Let Ŵ be a tree that agrees with N, let where In the following, for any anti-chain X in Ŵ and all such that equality holds in (5). We consider a switching S ′ ∈ S Z i →R ′ constructed from switchings S i−1 ∈ S Z i−1 →R ′ \R * and S * ∈ S Ŵ v i →R * as well as a mapping ψ ′ coinciding with the cost of ψ i−1 is optimal on A Z i−1 (S i−1 ) and (d) the cost of ψ * is optimal on A v i (S * ) . By induction (4) Then, " ≤ " follows from the fact that R * is only one of the possible choices for the minimum in (5).
For " ≥ ", let c x ∈ φ(x) and R * ⊆ R x ∩ Succ R↑ N (x) be such that equality holds in (4). We consider a switching S ′ ∈ S Ŵ x →R x constructed from switchings S t and S * with , and S * ∈ S {x}→R * , as well as a mapping ψ ′ coinciding with ψ x on YW Ŵ x constructed from mappings ψ t and ψ * such that (a) ψ t coincides with ψ x , (c) ψ * (x) = c x , (d) the cost of ψ t is optimal on A Z t (S t ) and (e) the cost of ψ * is optimal on A {x} (S * ) . Then, the cost of ) and, by the claim above, the cost of ψ t on . Then, as S ′ ∈ S Ŵ x →R x , " ≥ " follows by optimality of S and φ.
For " ≤ ", let c x := φ(x) and let R * := Succ R↑ S (Ŵ x ) . We use independence of sub-solutions and the induction hypothesis to show that the cost of φ on

In order to solve the softwired parsimony problem given N, φ and Ŵ , all we have to do is compute . Then, by Lemma 8, the softwired parsimony score of N with respect to φ can be read from T SW [ρ Ŵ , ∅, ∅] . In the following, let ψ x be fixed. Then, for fixed c x , we can compute Q for all x and R x , we have to check |V(N)| choices for x, as well as |φ(x)| ≤ |C| choices for c x and 3^{|Succ R↑ N (x)|} choices for R x and R * ⊆ R x combined. Altogether, the table T SW can be computed in time is absorbed by this. For practical purposes, note that estimating |Succ R↑ N (x)| ≤ |YW Ŵ x | is quite crude and equality will almost never be attained. Then, the following result holds:

Theorem 2 Given a network N, φ : V(N) → 2^C and a tree Ŵ agreeing with N, the softwired parsimony score of (N , φ) can be computed in O(|C|^{yw(Ŵ)} · (3^{yw(Ŵ)} · |C| · |V(N)| + |A(N)|)) time.
Again, we can replace yw(Ŵ) by tw(N ) using Proposition 1.

Corollary 2 Let (N , φ) be an instance of Softwired Parsimony. Let t ≥ tw(N ) and let T be the time in which a width-t tree decomposition of N can be computed. Then, the softwired parsimony score of (N , φ) can be computed in O(T + |C|^t · (3^t · |C| · |V(N)| + |A(N)|)) time.
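To make the role of switchings concrete, here is a minimal brute-force sketch of the softwired score, assuming the usual reading that a switching keeps exactly one incoming arc per reticulation and that the cost is counted over the arcs of the switching. It enumerates all switchings and all assignments respecting φ, so it is exponential and only useful for sanity checks on tiny inputs; it is not the table-based algorithm of Theorem 2. The function name, data representation and toy network are assumptions made for illustration.

```python
from itertools import product

def softwired_score_bruteforce(arcs, phi):
    """Minimum, over all switchings S and assignments psi respecting phi,
    of the number of arcs of S whose endpoints get different states."""
    parents = {}
    for u, w in arcs:
        parents.setdefault(w, []).append(u)
    # Reticulations are the nodes with more than one parent in N.
    reticulations = sorted(w for w, ps in parents.items() if len(ps) > 1)
    # Arcs into tree nodes cannot be switched off.
    tree_arcs = [(u, w) for u, w in arcs if len(parents[w]) == 1]
    nodes = sorted(phi)
    best = None
    # A switching keeps exactly one parent for each reticulation.
    for kept in product(*(parents[r] for r in reticulations)):
        switching = tree_arcs + list(zip(kept, reticulations))
        for choice in product(*(sorted(phi[v]) for v in nodes)):
            psi = dict(zip(nodes, choice))
            cost = sum(1 for u, w in switching if psi[u] != psi[w])
            if best is None or cost < best:
                best = cost
    return best

# Hypothetical toy network: reticulation h has parents a and b.
arcs = [("r", "a"), ("r", "b"), ("a", "h"), ("b", "h"),
        ("a", "x1"), ("a", "x2"), ("b", "y1"), ("b", "y2"), ("h", "z")]
states = {0, 1}
phi = {"r": states, "a": states, "b": states, "h": states,
       "x1": {0}, "x2": {0}, "y1": {1}, "y2": {1}, "z": {1}}
score = softwired_score_bruteforce(arcs, phi)
```

Here, keeping the arc (b, h) and assigning a = 0 and r = b = h = 1 gives cost 1. Counting both arcs into h, as the hardwired variant does, would cost 2 on the same instance.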

Parental parsimony
For ease of presentation, we introduce some additional notation. First, for any a and b, we abbreviate max{a − b, 0} =: a ∸ b . Let ψ and ψ ′ be functions. If ψ maps all items to ∅ or to 0, then we say that ψ is a zero-function and we write ψ = \vec{0} . We use ψ − ψ ′ to denote the function defined on the domain of , otherwise. This definition extends to functions mapping to sets in a natural way.
Each finite-cost lineage function f corresponds to a phylogenetic tree "embedded" in N whose branches are called lineages (see Fig. 1(right)). For each x ∈ V (N ) , f(x) represents the set of such lineages passing through x. Each such lineage may "choose" a parent among the parents of x in N. This models the biological circumstance that a character trait may be inherited from any parent. We compute (the cost of) an optimal lineage function on N using a tree Ŵ that agrees with N. To compute cost f (x) , we require knowledge of y∈Pred(x) |f (y)| as well as y∈Pred(x) f (y) (see Definition 4). We partition the predecessors of x over which the formula iterates into those above x in Ŵ and those below (since Ŵ agrees with N, all predecessors of x in N are comparable to y in Ŵ ). For all y ∈ YW Ŵ x , we thus store We will compute table entries for x using the already computed table entries for the children v i of x in Ŵ . In these lookups, we have x ∈ YW Ŵ v i so, to be consistent with the semantics, we have to make sure that (x) = U , ψ(x) = D , and that all lineages of x that are not inherited from Pred Further, each child y of x in N may inherit a lineage from x and, if y is above x in Ŵ , this has to be registered by removing the lineages of U from ψ(y) and subtracting |U| from η(y) . Finally, the lineages represented by ψ and η are distributed among the children of x in Ŵ using the table Q. In the following, in order to avoid treating the case that x = ρ N separately, we define ρ(x) := 1 − δ (x, ρ N ) , that is, ρ(x) = 1 if and only if x = ρ N . Definition 7 Let Ŵ be a tree that agrees with N, let Note how the table Q x distributes the lineage branches of x whose parents are in Ŵ x among the children of x in Ŵ . We show that both T PT and Q x are monotone in ψ and η (wrt. ).

Lemma 10 Let Ŵ be a tree agreeing with N, let
and η x (w) ≤ u∈Pred N ↑x(w) |f (u)|. If there are no such f, . From f i−1 and f * , we construct a lineage function f ′ ∈ LF N ,φ whose cost on Z i is j<i u∈Ŵv j cost f i−1 (u) + u∈Ŵv i cost f * (u) . Then, " ≥ " follows by optimality of f i on Z i . For " ≤ ", let ψ ′ and η ′ be such that, for all |f i (u)| . By independence of subsolutions, f i is optimal on Z i−1 and on Ŵ v i so, by induction hypotheses, the cost of Since ψ ′ and η ′ are only one of the possible choices for the minimum in (8), " ≤ " follows.
For " ≥ ", let D ⊆ U ⊆ φ(x) such that equality holds in (7). We construct a lineage function f ′ that assigns f ′ (x) = U and such that the lineages of D are inherited from parents of x (in N) that are below x in Ŵ . To this end, we ask the dynamic programming table for the cost of a lineage function that is optimal on Z t and such that 1.
to inherit |U| lineages in total: | x (u)| come from every parent u of x in YW Ŵ x while the rest has to be inherited from Ŵ x ) and 4. η ′ (w) = η x (w) .

−|U | for all w ∈ Succ
. −|U | satisfy the conditions of Claim 3, the optimal cost of such a lineage function Further, the cost of f ′ on x is the number of lineages in U that is not inherited "for free" from parents of x, that is, For " ≤ ", let U := f (x) and let D : be the set of lineages of U that are inherited from parents of x in N that are below x in Ŵ . By independence of sub-solutions, f is optimal on Z t so, by Claim 3, its cost on Z t is Q x [t, ψ ′ , η ′ ] where ψ ′ := ψ x [. . .] and η ′ := η x [. . .] are defined as in (7) and its cost on f (x))| . Then, " ≤ " follows from the fact that U and D are only one of the possible choices for the minimum in (7).
To solve the parental parsimony problem given N, φ and Ŵ , we compute . . , |C|} (by Definition 7, no value larger than |C| ever enters η x and all modifications to η x decrease the mapped-to values). To this end, Q x [i, ψ, η] is computed for each x, i, , ψ , and η by making at most x,c x and T PT . As there are O(|A(N)|) valid combinations of x and i, the table Q can be computed

Again, we can replace yw(Ŵ) by tw(N ) using Proposition 1.

Corollary 3 Let (N , φ) be an instance of Parental Parsimony. Let t ≥ tw(N ) and let T be the time in which a width-t tree decomposition of N can be computed. Then, the parental parsimony score of (N , φ) can be computed in O(T + 6^{t·|C|} · 4^{t·log |C|} · |A(N)|) time.
Note that the parental parsimony setting supports assigning multiple states of a character to a single species, thereby modeling species carrying multiple alleles of a single gene. By forcing x is a leaf, we can trivially modify our dynamic programming to explain multiple character states in extant species.
Corollaries 1, 2 and 3 give the running times of our algorithms as depending on the treewidth of N. The state-of-the-art solutions for Hardwired Parsimony, Softwired Parsimony and Parental Parsimony have the following respective running times: O(|C|^{r+2} · |V(N)|) [9], O(2^ℓ · |C|^2 · |V(N)| · |A(N)|) [8] and O(|2^C|^{ℓ+3} · |V(N)|) [12]. Since the scanwidth of N is potentially much smaller than its level ℓ [28], and the treewidth of N is smaller than its scanwidth [20], we have tw(N ) − 1 ≤ ℓ ≤ r . Thus, we expect that there will be several cases where our algorithms will be faster than the current best-known ones.
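To get a rough feeling for how the two softwired bounds scale, the sketch below evaluates their dominant terms for given parameters. It drops all hidden constants and the decomposition time T, so the numbers are only indicative of growth and say nothing about measured running times. The function names and the sample parameter values are hypothetical.

```python
def softwired_bound_treewidth(c, t, n, m):
    """Dominant term of the Corollary 2 bound, |C|^t (3^t |C| |V| + |A|),
    ignoring constants and the decomposition time T."""
    return c**t * (3**t * c * n + m)

def softwired_bound_level(c, level, n, m):
    """Dominant term of the level-based bound 2^l |C|^2 |V| |A| cited
    from [8], ignoring constants."""
    return 2**level * c**2 * n * m

# Hypothetical instance: 4 states, 100 nodes, 130 arcs.
by_treewidth = softwired_bound_treewidth(c=4, t=3, n=100, m=130)
by_level = softwired_bound_level(c=4, level=3, n=100, m=130)
```

Which expression is smaller depends on how far apart the treewidth and the level of the given network are, which is why a comparison of these parameters on practically relevant networks matters.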

Discussion
In this paper, we focused on the small version of the parsimony problem for networks given a specific position in the genome. When markers can be assumed to be independent, as is the case when a certain distance is preserved between genomic locations included in the matrix, each position can be analyzed separately, and the parsimony score of a network w.r.t. the matrix is simply the sum of the parsimony scores of the network for each genomic location. Thus, the algorithms presented here can be easily extended to several independent genomic locations. Moreover, our formulations are defined for networks that are not necessarily binary, can account for polymorphism and can impose restrictions on ancestral states. As discussed above, our algorithms can be orders of magnitude faster than the state-of-the-art solutions.
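The extension to several independent genomic locations amounts to summing single-site scores over the columns of the character matrix. The sketch below illustrates this with a pluggable single-site scorer. As a stand-in it uses the parsimony score of a star tree (one internal node adjacent to all leaves), which is simply the number of leaves minus the count of the most frequent state; any of the network scores discussed above could be substituted. All names and the toy alignment are assumptions made for illustration.

```python
from collections import Counter

def site_columns(alignment):
    """Yield one dict leaf -> state per column of the character matrix."""
    leaves = sorted(alignment)
    length = len(alignment[leaves[0]])
    for i in range(length):
        yield {leaf: alignment[leaf][i] for leaf in leaves}

def matrix_score(alignment, score_one_site):
    """Sum the single-site scores, assuming the sites are independent."""
    return sum(score_one_site(col) for col in site_columns(alignment))

def star_tree_score(column):
    """Stand-in scorer: parsimony score of a star tree, obtained by giving
    the single internal node the most frequent leaf state."""
    counts = Counter(column.values())
    return len(column) - max(counts.values())

# Hypothetical three-taxon alignment with three sites.
alignment = {"x": "AAC", "y": "AGC", "z": "ATC"}
total = matrix_score(alignment, star_tree_score)
```

Only the second column varies across the leaves, and it contributes 2 to the total.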
A comparison of the reticulation number, the level, the scanwidth and the treewidth for practically relevant classes of networks would thus be an interesting project for future work. Our results are slightly overshadowed by the fact that optimal tree decompositions are very hard to compute. However, practical exact and approximation algorithms are available today and we expect them to perform well, as phylogenetic networks can be expected to only be moderately tangled.
paper by Bachoore and Bodlaender [29], considering tree decompositions minimizing a weight function over the bags.
The ability to quickly score phylogenetic networks under the parsimony framework could be a big help in designing likelihood-based heuristics or Bayesian methods to infer networks from independent markers [28,30] by providing fast heuristics to compute the initial networks with which to start the likelihood or Bayesian search, or to design fast local-search techniques.
In the future, we would like to tackle the small parsimony problem for several dependent genomic locations (e.g. a gene). Little is known about this problem, except that it stays NP-hard even for binary characters on level-1 networks [31] and that it is fixed-parameter tractable in the number of reticulations of the network [6]. Another important direction would be to study the big parsimony problem, which is currently wide open, even lacking a consensus on the definition of optimality [6, 32-34].
The proof is by induction on the height of x in Ŵ . For the induction base, suppose that x is a leaf in Ŵ and note that A x (N ) = A {x} (N ) in this case. Then, (3) simplifies to Since ψ(x) ∈ φ(x) , we know that ψ(x) participates in the minimum in (9), implying the " ≤"-direction. For the " ≥"direction, assume that T HW [x, ψ x ] < uw∈Ax(N ) δ ψ (u, w) . By (9), there is some c x = ψ(x) with c x ∈ φ(x) and N ). For the induction step, suppose that t > 0 and consider both directions separately.
Furthermore, closer inspection of our dynamic programming formulations (most prominently Definition 6) unveils that their computation is faster when the maximum number of reticulations in each bag is small. Thus, it would be interesting to be able to compute tree decompositions in which this quantity is low, to the point where one could improve running time of the algorithm by sacrificing optimality of the decomposition in favor of reducing this "reticulation density". Research in this direction is, to the best of our knowledge, limited to a Then, by Lemma 6 (with Z = {x} , Y = {v i } , G = {N } and (S y ) y∈Y = (N ) y∈Y ), optimality of ψ on A x (N ) implies optimality of ψ i on A v i (N ) . Thus, we can use the induction hypothesis on T HW [v i , ψ i ] . Since ψ(x) participates in the minimum of (3), "≥ ": Assume towards a contradiction that the lemma is false, that is, "<" holds. By (3), there is some c x ∈ φ(x) such that Since c x ∈ φ(x) , we can extend ψ x [x → c x ] to V(N) without violating φ , that is, there are functions Since, by assumption, T HW [x, ψ x ] is strictly less than the cost of ψ on A x (N ) , we conclude that the cost of ψ ′ on A x (N ) is strictly less than that of ψ , contradicting optimality of ψ.
is minimum among all such S and ψ. Then, Proof Note that arcs that are incoming to tree nodes cannot be switched off and, thus, Succ S ′ (z) for all z ∈ V (N ) and all switchings S ′ ∈ S(N ) . The proof is by induction on the height of x in Ŵ.

Case 1: x is a leaf in Ŵ , that is, t = 0 . First, note that R x ⊆ Succ R↑ N (x) and no r ∈ R x ⊆ R(N ) can have all their parents in Ŵ x = {x} , thus implying S x→R x (N ) ≠ ∅ . Next, let y be the predecessor of x in S and note that y ∈ Pred ↓ N (x) = Pred N (x) . Further, y minimizes δ ψ (y, x) among all y ∈ Pred N (x) as, otherwise, we can construct a new switching S ′ ∈ S Ŵ x →R x (N ) by replacing yx by some y ′ x with y ′ ∈ Pred N (x) , thereby contradicting (b). Clearly, and there is some c x ∈ φ(x) such that equality holds if ψ(x) = c x . Let ψ * := ψ[x → c x ] be the result of changing the assignment of x to c x in ψ and note that ψ x ψ * . Clearly, we still have S ∈ S Ŵ x →R x (N ) . Thus,

Case 2: x has children v 1 , v 2 , ..., v t in Ŵ . Recall that we suppose that x ∈ ⋃ i≤t YW Ŵ v i by Lemma 3. For all S * ∈ S(N ) and all anti-chains Y in Ŵ , abbreviate S Y → y∈Y Succ R↑ S * (Ŵ y ) (N ) =: S Y ,S * (N ) , that is, roughly, the set of switchings of N with the same "behavior" as S * on Y. The proof of Case 2 relies on the independence of partial solutions established by Lemma 6 with G = S Y ,S * (N ) . To apply Lemma 6, we show that any set of switchings S y such that {Succ is a Y-substitution system for S Y ,S * (N ).

Claim 4 Let S * ∈ S(N ) and let Y be an anti-chain in
≥ uw∈A x (S) δ ψ (u, w) , it is sufficient to show that S ′ ∈ S(N ) . Towards a contradiction, assume there is a node w ∈ V (N ) − ρ N that does not have exactly one parent in S ′ and let u * be the parent of w in S * . Clearly, for each y ∈ Y , we have w / ∈ Ŵ y as, otherwise, First, suppose w has no parent in S ′ . Then, u * w ∈ y∈Y A y (S * ) that is, u * ∈ Ŵ y for some y ∈ Y , but w / ∈ A y (S y ) . But since S y ∈ S(N ) , we know that w has a parent in S y (which is not u * since w / ∈ A y (S y ) ), implying that w is a reticulation in N. Thus, . But then, S y ′ contains an arc uw ∈ A y ′ (S y ′ ) which is in S ′ by construction, thus contradicting w having no parents in S ′ .
Second, suppose that w has at least two distinct parents u and u * in S ′ and note that, again, w is a reticulation in N. Since S * is a switching, at least one of them, say u, is such that uw ∈ y∈Y A y (S y ) . However, since the Succ R↑ S y (Ŵ y ) are disjoint and each S y is a switching, we cannot have u * w ∈ y∈Y A y (S y ) . Thus, u * w ∈ A(S * ) \ y∈Y A y (S * ) .
However, since y∈Y Succ R↑ S * (Ŵ y ) = y∈Y Succ R↑ S y (Ŵ y ) , we know that uw ∈ A S * (Ŵ y ) for some y ∈ Y . But then, w has two parents in S * contradicting S * ∈ S(N ).
In the following, we prove the semantics of the table Q ψ x x,c x . For all i ≤ t , abbreviate ⋃ 1≤j≤i Ŵ v j =: Z i . w) is minimum among all such S i and ψ i and Proof The proof is by induction on i, noting that Furthermore, S 1 , ψ 1 , and R ′ satisfy the conditions of the lemma for v 1 , so we can employ the induction hypothesis of the lemma. Thus, By induction hypotheses (of the claim and the lemma), there are switchings S i−1 and S ′ of N with Succ it is sufficient to show that S i can be turned into a switching of N without changing Succ R↑ S i (Z i ) . To this end, suppose that there is a node w ≠ ρ N of N that does not have exactly one parent in S i . Since S i−1 and S ′ are switchings, w has parents u i−1 and u ′ in S i−1 and S ′ , respectively. If w has no parent in S i , then u i−1 w ∈ A v i (S i−1 ) and u ′ w ∉ A v i (S ′ ) and, thus, Then, we can just add the arc u ′ w to S i without changing Succ R↑ S i (Z i ) . If w has at least two parents, then u i−1 and u ′ are both parents of w in S i , that is, and, thus, u ′ < Ŵ v i < Ŵ u i−1 , implying u ′ ≠ u i−1 as well as w ∈ YW Ŵ v i and w ∈ R * . But then, we can remove u i−1 w from S i without changing Succ R↑ S i (v i ) . Repeating this argument, we can turn S i into a switching of N with Succ For the second part of the claim, we show both inequalities separately.
"≤ ": Let S i ∈ S Z i →R ′ (N ) and ψ i : is minimum among all such S i and ψ i . Further, let are finite by induction hypotheses. Then, as R * ⊆ R ′ ∩ Succ R↑ N (Ŵ v i ) , we know that R * participates in the minimum of (5). Thus, x,c x [i, R ′ ] is infinite, so suppose it is finite. By (5), there is some First, since , R * ] ≠ ∞ , the induction hypothesis (of the lemma) guarantees that there is some S * ∈ S Ŵ v i →R * (N ) and ψ * : , and ψ ′ (a) := ψ i (a) , otherwise. Note that ψ ′ φ . Further, ψ i and ψ i−1 coincide on YW Ŵ Z i−1 and, thus, ψ ′ and ψ i−1 coincide on all nodes touched by . Further, ψ i and ψ * coincide on YW Ŵ v i and, thus, ψ ′ and ψ * coincide on all nodes touched by Having established the semantics of Q ψ x x,c x , we can finish proving Case 2 of Lemma 8. First, consider the case that S Ŵ x →R x (N ) = ∅ and assume that T SW [x, ψ x , R x ] ≠ ∞ . By Eq. (4) and Claim 5, there is some c x and Let S ′ be a switching in one of these sets and note that Succ , then y ∈ R * and S ′ contains an arc zy for some z ∉ Ŵ x , implying that we can swap zy for xy in S ′ without affecting Succ R↑ S ′ (Z t ) or S ′ being a switching. Thus, we can assume without loss of generality that Succ In the following, we thus assume that S Ŵ x →R x ≠ ∅ and we show both directions of the lemma separately.
"≤ ": Let c x := ψ(x) ∈ φ(x) , let R * := Succ R↑ S (x) , and note that R * = Succ R↑ S (x) ⊆ Succ R↑ S (Ŵ x ) = R x . Further, let y := Pred S (x) be the parent of x in S. Since Ŵ agrees with N (and, thus, with S) we know that either and, by Claim 5, (12)  Then, since c x and R * are valid choices for the minima in (4), we have "≥ ": Suppose that T SW [x, ψ x , R x ] ≠ ∞ as, otherwise, this direction is trivial. We consider each case of the minimum in (4) individually (although both cases are analogous).

Case 2.1: Pred ↓ N (x) ≠ ∅ and there are c x ∈ φ(x) and R * ⊆ R x ∩ Succ R↑ N (x) such that By Claim 5, there is some S ′ ∈ S Z t →R x \R * (N ) and some w) is minimum among all such S ′ and ψ ′ and (13) (4), (12), (13) ≤ r∈R * ∪Succ From S ′ we construct a switching S * ∈ S Ŵ x →R x (N ) by 1. swapping each arc zr ∈ A(S ′ ) with r ∈ R * for xr (which exists in N since R * ⊆ Succ R↑ N (x) ), 2. swapping each arc xr ∈ A(S ′ ) with r ∉ R x for an arc zr with z ∉ Ŵ x (which exists in N since S Ŵ x →R x (N ) ≠ ∅ ), and 3. swapping the arc yx ∈ A ↓ {x} (S ′ ) with an arc zx ∈ Pred ↓ N (x) × {x} minimizing δ ψ ′ (x, z) . Since this operation does not change the in-degree of any node, S * is still a switching of N and we have Succ

Case 2.2: Pred ↑ N (x) ≠ ∅ and there are c x ∈ φ(x) and is minimum among all such S ′ and ψ ′ and We construct a switching S * ∈ S Ŵ x →R x (N ) by 1. swapping each arc zr ∈ A(S ′ ) with r ∈ R * for xr (which exists in N since R * ⊆ Succ R↑ N (x) ) and 2. swapping each arc xr ∈ A(S ′ ) with r ∉ R x for an arc zr with . Since this operation does not change the in-degree of any node, S * is still a switching of N and we have Succ

Proof Note that the inequality on Q x trivially holds if Q x [i, ψ, η] = ∞ and, similarly for T PT . The proof is based on the observation that the transformations done to ψ and η in Equations (7) and (8) are monotone.
The following functions (acting on functions) are monotone The proof for g U ,D is completely analogous.
With Claim 6, we can show that monotonicity of Q x implies monotonicity of T PT .

Claim 7
Let v 1 , v 2 , . . . , v t be the children of x in Ŵ and suppose that Q x is monotone. Then, T PT is monotone.
such that the minimum in Equation (7) in Definition 7 is attained, that is, for some constants c U ,D and c * U ,D that are independent of φ and η . Since, by assumption, Q x is monotone for all and both f U ,D and g U ,D are monotone by Claim 6, we conclude Note the last " ≥ " since we only know that this particular value participates in the minimum that forms T PT [x, , ψ ′ , η ′ ] , while this minimum may be attained at an even smaller value.
By Claim 7, in order to prove Lemma 9, it is sufficient to show that Q x is monotone. This proof is by induction on the height of x in Ŵ and the value of the first argument i of Q x .
For the induction base, suppose that x is a leaf of Ŵ and note that x has t = 0 children.

Lemma 10
Let Ŵ be a tree agreeing with N, let x ∈ V (N ), let ψ x , x : YW Ŵ x → 2 C and η x : YW Ŵ x → N. Let f minimize cost (f ) among all lineage functions in LF N ,φ such that, for all w ∈ YW Ŵ x , there are no such f, then In this case, the table entry is assigned the cost If x = ρ N , this simplifies to |U | − 1 and, since |f (ρ N )| = 1 , the cost is minimized by U = f (ρ N ) and the table entry equals 0 = cost f (ρ N ) . Thus, in the following, let x ≠ ρ N . "≤ ": Since (20) is satisfied for U = f (x) , the minimum over all U is at most the cost when choosing "≥ ": Towards a contradiction, assume that there is a U satisfying (20) such We show that f ′ := f [x → U ] has less overall cost than f, contradicting its optimality. Since changing f(x) to U only influences the cost of x and its children in N, it suffices to consider them. To this end, let y be any child of x in N. (20). Finally, for each y ∈ Succ N (x),

Case 2: x has children v 1 , v 2 , . . . , v t with t ≥ 1 in Ŵ . In the following, we abbreviate Y i := ⋃ j≤i YW Ŵ v j and Z i := ⋃ j≤i Ŵ v j . Further, we call a lineage function f ′ eligible with respect to an anti-chain Y in Ŵ and functions ′ , ψ ′ , and η ′ if, for all w ∈ y∈Y YW Ŵ y , we have and |f (u)| + ρ(w) and the cost of f ′ is finite on y∈Y Ŵ y . We first show how the table Q x is used to distribute lineages among the v i .
The proof of the claim is by induction on i.
, ψ ′ , η ′ ] is finite, that is, by induction hypothesis of the lemma, there is a lineage function f ′ that is eligible for Y 1 , , ψ 1 = ψ ′ and η 1 = η ′ . Thus, the first part of the claim follows. Since ψ 1 and η 1 are the only valid choices for the minima in (8) that result in finite values, we conclude since f i is eligible with respect to Y 1 , , ψ 1 and η 1 and minimizes z∈Z 1 cost f i (z) − ρ(x).
ψ i−1 := ψ i − ψ ′ and η i−1 := η i − η ′ . By induction hypotheses, there are functions f i−1 and f ′ such that f i−1 is eligible with respect to Y i−1 , , ψ i−1 , η i−1 and f ′ is eligible with respect to {v i } , , ψ ′ , η ′ . We construct a function f * by setting (Note that the cost of f on N might be ∞ but we will see that its cost on Z i is finite). First, we show that f * is eligible with respect to Y i , , ψ i , and η i . To this end, let w ∈ YW Ŵ y for any y ∈ Y i . Then, by eligibility of f ′ and f i−1 Finally, the cost of f * on Z i equals the cost of f i−1 on Z i−1 plus the cost of f ′ on Ŵ v i and is, therefore, finite. Thus, f * is eligible for Y i , , ψ i and η i , implying the contraposition of the first part of the lemma. For the cost equality, we consider both directions separately. "≤ ": Let ψ ′ : YW Ŵ v i → 2 C and η ′ :  |f i (u)| + ρ(w)}.   Towards a contradiction, assume that this value is strictly smaller than z∈Z i cost f i (z) − ρ(x) . By the induction hypothesis of the lemma, there is a lineage function f ′ that is eligible with respect to {v i } , , ψ ′ , and η ′ with the induction hypothesis of the claim, there is a lineage function f i−1 that is eligible with respect to Y i−1 , , . We construct a lineage function f * by setting By eligibility of f i−1 , f i and f ′ , we know that f i−1 , f i and f * coincide with on y∈Y i−1 YW Ŵ y and f ′ , f i and f * coincide with on YW Ŵ v i . To contradict optimality of f, it thus suffices to show that f * is eligible with respect to Y i , , ψ i , and η i . To this end, note that, for all w ∈ y∈Y i YW Ŵ y , we have as well as  Having established the equality for Q x , we can now prove the lemma for i > 1 . For the first part, suppose that T PT [x, x , ψ x , η x ] ≠ ∞ . By (7), there are D ⊆ U ⊆ φ(x) such that By Claim 8, there is a lineage function f t that is eligible for {v t } , t := x [x → U ] , ψ t , and η t . 
Without loss of generality, suppose that f t (w) = t (w) for all w ∈ (YW Ŵ x ∪ {x}) \ y∈Y t YW Ŵ y . In particular, f t (x) = t (x) = U and f t has finite cost on Z t .
|f t (x)| > u∈Pred N (x) |f t (u)| . In the first case, n t (x) = |U | > 1 , contradicting η t (x) ≤ ρ(x) . In the second case, n t (x) = |U | Thus, f t is eligible with respect to {x} , x , ψ x and η x , implying the first part of the lemma. For the second part, we consider the directions separately.
We show that f t is eligible with respect to {x} , x , ψ x and η x . First, assume that cost f t (x) = ∞ , that is, either x = ρ N and |f t (x)| = |U | > 1 or x ≠ ρ N and "≥ ": We pick up the definition of f t and show that T PT [x, x , ψ x , η x ] ≥ z∈Ŵ x cost f t (z) . Then, " ≥ " follows from optimality of f on Ŵ x . Indeed, and, since ψ t (x) = D ⊆ f t (x) ∩ f (u) x (u))| = cost f (x) + ρ(x) . Further, let We show that f is eligible with respect to Y t , x , ψ t and η t . Then, Q x x→D z∈Z t cost f (z) − ρ(x) + cost f (x) + ρ(x) = z∈Ŵ x cost f (z) since U and D are valid choices for the minimum in (7).
To see that f is eligible, note that f (w) = x [x → U ] for all w ∈ y∈Y t YW Ŵ y since y∈Y t YW Ŵ y ⊆ YW Ŵ x ∪ {x} . Further, for the conditions on ψ t and η t , consider three cases for nodes in y∈Y t YW Ŵ y . First, if w = x , then Second, if w ∈ y∈Y t YW Ŵ y ∩ Succ   Otherwise, w ∈ y∈Y t YW Ŵ y \ (Succ ↑ N (x) ∪ {x}) and we have